SGDR: Stochastic Gradient Descent with Warm Restarts

نویسندگان

  • Ilya Loshchilov
  • Frank Hutter
چکیده

Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on CIFAR10 and CIFAR-100 datasets where we demonstrate new state-of-the-art results below 4% and 19%, respectively. Our source code is available at https://github.com/loshchil/SGDR.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fixing Weight Decay Regularization in Adam

L2 regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is not the case for adaptive gradient algorithms, such as Adam. While common deep learning frameworks of these algorithms implement L2 regularization (often calling it “weight decay” in what may be misleading due to the inequi...

متن کامل

CSE 599 i : Online and Adaptive Machine Learning Winter 2018 Lecture 6 : Non - stochastic best arm identification

Example 1. Imagine that we are solving a non-convex optimization problem on some (multivariate) function f using gradient descent. Recall that gradient descent converges to local minima. Because non-convex functions may have multiple minima, we cannot guarantee that gradient descent will converge to the global minimum. To resolve this issue, we will use random restarts, the process of starting ...

متن کامل

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

متن کامل

Texture analysis with the Volterra model using conjugate gradient optimisation

Texture is an important characteristic di erentiating objects or regions of interest in an image. A variety of approaches to texture analysis has previously been proposed; the approach in this paper is from the stochastic eld, whereby a two-dimensional Volterra model is used to represent a texture eld. Statistical moments are introduced as a means of determining the Volterra model generator for...

متن کامل

Primal-Dual Projected Gradient Algorithms for Extended Linear-Quadratic Programming

Many large-scale problems in dynamic and stochastic optimization can be modeled with extended linear-quadratic programming, which admits penalty terms and treats them through duality. In general the objective functions in such problems are only piecewise smooth and must be minimized or maximized relative to polyhedral sets of high dimensionality. This paper proposes a new class of numerical met...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016